Efficient DNN Model for Word Lip-Reading

Abstract

This paper studies various deep learning models for word-level lip-reading technology, one of the tasks in supervised video classification. Several public datasets have been published in this research field. However, few studies have investigated lip-reading techniques using multiple datasets. This paper evaluates models on four publicly available datasets, namely Lip Reading in the Wild (LRW), OuluVS, CUAVE, and Speech Scene by Smart Device (SSSD), which are representative datasets in this field. LRW is a large-scale dataset that targets 500 English words and was released in 2016. Initially, the recognition accuracy on LRW was 66.1%, but many research groups have been working on it. The current state of the art (SOTA) has achieved 94.1% with 3D-Conv + ResNet18 + {DC-TCN, MS-TCN, BGRU} + knowledge distillation + word boundary. Regarding the SOTA model, in this paper, we combine existing models such as ResNet, WideResNet, EfficientNet, MS-TCN, Transformer, ViT, and ViViT, and investigate effective models for word lip-reading using six deep learning models with modified feature extractors and classifiers. Through recognition experiments, we show that similar model structures, with 3D-Conv + ResNet18 for feature extraction and MS-TCN for inference, are valid for the four datasets with different scales.
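As a rough illustration of the back-end idea named in the abstract, the sketch below applies temporal convolutions with several kernel sizes to a sequence of per-frame features, concatenates the branches, averages over time, and scores word classes. It is not the authors' implementation: the function names, the kernel sizes (3, 5, 7), the random weights, and all dimensions are assumptions chosen for illustration, and the per-frame features stand in for the output of a front end such as 3D-Conv + ResNet18.

```python
import random


def conv1d_same(seq, weights):
    """Temporal convolution with zero padding so the output keeps T frames.

    seq:     list of T frames, each a list of C_in floats
    weights: kernels shaped [C_out][k][C_in], with k odd
    Returns a list of T frames, each with C_out non-negative floats (ReLU).
    """
    k = len(weights[0])
    pad = k // 2
    zero = [0.0] * len(seq[0])
    padded = [zero] * pad + seq + [zero] * pad
    out = []
    for t in range(len(seq)):
        window = padded[t:t + k]  # k consecutive frames centred on t
        frame = []
        for kernel in weights:
            s = 0.0
            for tap, x in zip(kernel, window):
                s += sum(w * v for w, v in zip(tap, x))
            frame.append(max(s, 0.0))  # ReLU
        out.append(frame)
    return out


def multi_scale_block(seq, branches):
    """Run one convolution per kernel size and concatenate channels per frame."""
    outs = [conv1d_same(seq, w) for w in branches]
    return [sum((o[t] for o in outs), []) for t in range(len(seq))]


def classify(seq, fc):
    """Average features over time, apply a linear layer, return (argmax, logits)."""
    n_frames = len(seq)
    pooled = [sum(f[c] for f in seq) / n_frames for c in range(len(seq[0]))]
    logits = [sum(w * v for w, v in zip(row, pooled)) for row in fc]
    return max(range(len(logits)), key=logits.__getitem__), logits


def random_kernel(rng, c_out, k, c_in):
    """Hypothetical helper: untrained random weights, for shape checking only."""
    return [[[rng.uniform(-0.1, 0.1) for _ in range(c_in)]
             for _ in range(k)] for _ in range(c_out)]


rng = random.Random(0)
T, C_IN, C_BRANCH, N_CLASSES = 29, 8, 4, 5  # illustrative sizes, not from the paper

# Stand-in for per-frame features produced by a visual front end.
features = [[rng.random() for _ in range(C_IN)] for _ in range(T)]

# One branch per temporal scale (kernel sizes are an assumption).
branches = [random_kernel(rng, C_BRANCH, k, C_IN) for k in (3, 5, 7)]
hidden = multi_scale_block(features, branches)

fc = [[rng.uniform(-0.1, 0.1) for _ in range(len(hidden[0]))]
      for _ in range(N_CLASSES)]
pred, logits = classify(hidden, fc)
print(len(hidden), len(hidden[0]), pred)
```

With untrained weights the predicted class is meaningless; the point is only the data flow, in which the number of frames is preserved by each branch and the channel count grows with the number of temporal scales.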

Similar resources

Efficient face model for lip reading

There are a number of studies on lip reading. However, there is little discussion of which face model is effective for lip reading. This paper builds various face models by changing the combination of face parts and the feature points. Various experiments were conducted under conditions in which only the model changes and the other algorithms stay fixed. We apply the active appearance m...

An Efficient Lip-reading Method Using K-nearest Neighbor Algorithm

Many studies have been carried out on lip reading. Most of those works are based on color images, from which some essential features, such as inner lip information, might not be obtained. In this paper, an RGBD camera is introduced to improve the recognition rate of lip reading. We try to complete lip reading using only gray-scale images. Thirteen groups of words are given, and we present e...

Lip-reading and Bilingualism

This study investigated whether observers can identify what language was being spoken in visual-only speech stimuli, and whether or not this ability depends on an observer's prior linguistic experience. Participants watched visual-only speech stimuli and were asked to decide if the talker in the video was speaking English or Spanish. Four groups of participants were studied: monolinguals and bi...

Lip Reading in Profile

There has been a quantum leap in the performance of automated lip reading recently due to the application of neural network sequence models trained on a very large corpus of aligned text and face videos. However, this advance has only been demonstrated for frontal or near frontal faces, and so the question remains: can lips be read in profile to the same standard? The objective of this paper is...

Towards Unrestricted Lip Reading

Lip reading provides useful information in speech perception and language understanding, especially when the auditory speech is degraded. However, many current automatic lip reading systems impose some restrictions on users. In this paper, we present our research efforts, in the Interactive System Laboratory, towards unrestricted lip reading. We first introduce a top-down approach to automatically...

Journal

Journal title: Algorithms

Year: 2023

ISSN: 1999-4893

DOI: https://doi.org/10.3390/a16060269